09. Action-Value Functions

Note: In this course, we use "return" and "discounted return" interchangeably. For an arbitrary time step t, both refer to G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots = \sum_{k=0}^\infty \gamma^k R_{t+k+1}, where \gamma \in [0,1]. In particular, "return" does not imply that \gamma = 1, and "discounted return" does not imply that \gamma < 1. (The same convention holds for the readings in the recommended textbook.)
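To make the definition concrete, here is a minimal Python sketch that evaluates G_t for a finite list of rewards. It assumes the episode ends after the last reward, so the infinite sum truncates. The function name `discounted_return` and its arguments are illustrative and do not come from the course materials.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma**k * R_{t+k+1} for a finite reward sequence.

    rewards: [R_{t+1}, R_{t+2}, ...], assumed to end when the episode ends.
    gamma:   discount rate in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")

    g = 0.0
    # Work backwards through the rewards using G_t = R_{t+1} + gamma * G_{t+1}.
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_return(rewards, 1.0))  # gamma = 1: plain sum of rewards
    print(discounted_return(rewards, 0.5))  # gamma < 1: later rewards count less
    print(discounted_return(rewards, 0.0))  # gamma = 0: only R_{t+1} counts
```

The same function covers both \gamma = 1 and \gamma < 1, which mirrors the convention above: "return" and "discounted return" name the same quantity, and only the value of \gamma differs.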